@snowmead
Contributor

@snowmead snowmead commented Nov 18, 2025

MSPs now retry storage requests that fail because of proof invalidation errors, i.e. ForestProofVerificationFailed, KeyProofVerificationFailed, or FailedToApplyDelta. In these cases, the MSP re-queues the storage requests so that it can respond to them again.

Design

BatchProcessStorageRequests is a new event handler that queries the pending storage requests still waiting for the MSP's response. For each one, it causes a NewStorageRequest event to be emitted via a new command (PreprocessStorageRequest). This starts the usual storage request flow: the MSP receives the file from the user, adds it to storage, and queues the storage request for a response.

The file keys that are processed and responded to are tracked via a shared HashMap in a single MspUploadFileTask instance. This single instance is critical for the retry mechanism to work correctly, as all event handlers (BatchProcessStorageRequests, NewStorageRequest, RemoteUploadRequest, ProcessMspRespondStoringRequest) must share the same file_key_statuses HashMap to see status updates from each other.

A new FileKeyStatus enum tracks each file key's processing state across concurrent event handlers:

  • Processing - File key is in the pipeline
  • Accepted - Successfully accepted on-chain
  • Rejected - Explicitly rejected on-chain
  • Abandoned - Failed with non-proof dispatch errors (permanent failures, won't be retried)
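As a rough illustration of the shared status map described above, here is a minimal Rust sketch. The `FileKey` alias, the `FileKeyStatuses` alias, the use of `Arc<Mutex<...>>`, and the `try_start_processing` helper are assumptions made for this example and are not taken from the PR; only the `FileKeyStatus` variants and the idea of a single shared HashMap come from the description.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real file key type.
type FileKey = [u8; 32];

/// Processing state of a file key, mirroring the variants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKeyStatus {
    Processing,
    Accepted,
    Rejected,
    Abandoned,
}

/// One map shared by all handlers (assumed here to sit behind an Arc<Mutex<..>>),
/// so a status written by one handler is visible to the others.
type FileKeyStatuses = Arc<Mutex<HashMap<FileKey, FileKeyStatus>>>;

/// Hypothetical helper: marks a file key as Processing only if it has no
/// status yet. Returns true when the caller should start processing it.
fn try_start_processing(statuses: &FileKeyStatuses, key: FileKey) -> bool {
    let mut map = statuses.lock().expect("status map lock poisoned");
    if map.contains_key(&key) {
        // Already in the pipeline, finished, or abandoned: skip it.
        return false;
    }
    map.insert(key, FileKeyStatus::Processing);
    true
}
```

Cloning the `Arc` hands each handler a reference to the same map. If each handler owned its own map instead, a status written by one would never be seen by the others, which is why the description stresses a single task instance.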

Retry Mechanism: When proof errors occur (ForestProofVerificationFailed, FailedToApplyDelta), file keys are removed from the file_key_statuses HashMap rather than being marked with a Failed status. This signals to the next BatchProcessStorageRequests cycle that the file key should be re-processed. The file key will be re-inserted with Processing status, triggering a new NewStorageRequest event and regenerating proofs with the updated forest root.
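The retry rule can be sketched as a single function that records the outcome of a response. This is a simplified illustration under stated assumptions: `RespondOutcome` and `record_outcome` are hypothetical names introduced here, and the real handler works with decoded dispatch errors rather than a four-variant enum.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real file key type.
type FileKey = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKeyStatus {
    Processing,
    Accepted,
    Rejected,
    Abandoned,
}

/// Hypothetical summary of what happened when the MSP responded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RespondOutcome {
    Accepted,
    Rejected,
    /// A proof error such as ForestProofVerificationFailed or FailedToApplyDelta.
    ProofError,
    /// Any other dispatch error.
    OtherDispatchError,
}

/// Updates the status map after a response attempt.
fn record_outcome(
    statuses: &mut HashMap<FileKey, FileKeyStatus>,
    key: FileKey,
    outcome: RespondOutcome,
) {
    match outcome {
        RespondOutcome::Accepted => {
            statuses.insert(key, FileKeyStatus::Accepted);
        }
        RespondOutcome::Rejected => {
            statuses.insert(key, FileKeyStatus::Rejected);
        }
        // Removing the entry (instead of storing a failure status) is what
        // lets the next batch cycle pick the file key up again.
        RespondOutcome::ProofError => {
            statuses.remove(&key);
        }
        // Permanent failure: keep an entry so the key is never retried.
        RespondOutcome::OtherDispatchError => {
            statuses.insert(key, FileKeyStatus::Abandoned);
        }
    }
}
```

Using absence from the map as the retry signal means the batch cycle needs only one check, whether the key has a status at all, to decide if it should process a file key.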

The batch processing cycle is serialized by a semaphore with a single permit, so only one cycle runs at a time. The blockchain service emits the BatchProcessStorageRequests event at every block, provided the semaphore isn't held. This gives more linear control over when the MSP preprocesses storage requests and queues them for a response.
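The gating behaviour can be illustrated with a standard-library stand-in for a one-permit semaphore. This is only a sketch: `BatchGate`, `BatchPermit`, and `on_new_block` are hypothetical names, and the actual client may well use an async semaphore instead of the `AtomicBool` used here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for a semaphore with exactly one permit.
#[derive(Clone, Default)]
struct BatchGate {
    held: Arc<AtomicBool>,
}

/// Held for the duration of one batch cycle; releases the gate on drop.
struct BatchPermit {
    held: Arc<AtomicBool>,
}

impl BatchGate {
    /// Takes the permit if it is free, without blocking.
    fn try_acquire(&self) -> Option<BatchPermit> {
        self.held
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| BatchPermit {
                held: Arc::clone(&self.held),
            })
    }
}

impl Drop for BatchPermit {
    fn drop(&mut self) {
        self.held.store(false, Ordering::Release);
    }
}

/// Called once per block: a batch cycle starts only when Some is returned,
/// that is, when no earlier cycle still holds the permit.
fn on_new_block(gate: &BatchGate) -> Option<BatchPermit> {
    gate.try_acquire()
}
```

Because the permit is released when `BatchPermit` is dropped, a cycle that exits early or returns an error still frees the gate for a later block.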

The MspHandler tracks queued storage requests in an in-memory FIFO queue (pending_respond_storage_requests), with O(1) deduplication via a HashSet (pending_respond_storage_request_file_keys) that prevents the same file key from being queued twice.
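A minimal sketch of the queue-plus-set pattern follows. The field names are the ones mentioned above; the `PendingRespondQueue` struct, its methods, and the `FileKey` alias are assumptions for this example, and the real queue holds full storage request entries rather than bare file keys.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for the real file key type.
type FileKey = [u8; 32];

#[derive(Default)]
struct PendingRespondQueue {
    /// FIFO order in which requests will be responded to.
    pending_respond_storage_requests: VecDeque<FileKey>,
    /// Mirror of the queue's contents for O(1) membership checks.
    pending_respond_storage_request_file_keys: HashSet<FileKey>,
}

impl PendingRespondQueue {
    /// Queues a file key unless it is already pending.
    /// Returns true if the key was newly queued.
    fn queue(&mut self, key: FileKey) -> bool {
        // HashSet::insert returns false when the value was already present.
        if !self.pending_respond_storage_request_file_keys.insert(key) {
            return false;
        }
        self.pending_respond_storage_requests.push_back(key);
        true
    }

    /// Pops the oldest pending key and clears it from the set,
    /// so the same key may be queued again later (for example on retry).
    fn pop(&mut self) -> Option<FileKey> {
        let key = self.pending_respond_storage_requests.pop_front()?;
        self.pending_respond_storage_request_file_keys.remove(&key);
        Some(key)
    }
}
```

Keeping the set in step with the queue on both push and pop is what makes the deduplication safe: a key that has been popped for a batch can be queued again if that batch later fails with a proof error.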

Storage Deprecation

Deprecates the following column families from RocksDB storage:

  • pending_msp_respond_storage_request
  • pending_msp_respond_storage_request_left_index
  • pending_msp_respond_storage_request_right_index

These will be removed once we implement proper migration automation for RocksDB.

Notable changes

  • Adds a new ModuleError associated type to StorageEnableRuntime to enable type safety when decoding extrinsic errors into specific pallet errors
  • Adds a new integration test which explicitly verifies that the MSP can retry storage request acceptance after it fails with invalid proof errors
  • Uses the batchStorageRequests helper API in integration tests that create multiple storage requests
  • Fixes a bug in the conditional checks of the batchStorageRequests helper API and adds logging for observability
  • Adds .mcp.json to .gitignore

Simplified Dataflow

flowchart TD
    A[BatchProcessStorageRequests<br/><i>periodic, from BlockchainService</i>] --> B[NewStorageRequest<br/><i>file 1</i>]
    A --> C[NewStorageRequest<br/><i>file 2</i>]
    A --> D[NewStorageRequest<br/><i>file N</i>]
    B --> E{File in storage?}
    C --> E
    D --> E
    E -->|No| F[RemoteUploadRequest<br/><i>chunk uploads from user</i>]
    E -->|Yes| G[on_file_complete]
    F -->|file complete| G
    G --> H[queue_msp_respond_storage_request<br/><i>in-memory FIFO queue</i>]
    H --> I[ProcessMspRespondStoringRequest<br/><i>batched extrinsic submission</i>]
    I -->|Proof Error| J[Remove from file_key_statuses<br/><i>triggers retry on next cycle</i>]
    I -->|Success| K[Accepted/Rejected status]
    I -->|Non-proof Error| L[Abandoned status<br/><i>permanent skip</i>]

@snowmead snowmead added B0-silent Changes should not be mentioned in any release notes D3-trivial👶 PR contains trivial changes that do not require an audit not-breaking Does not need to be mentioned in breaking changes labels Nov 18, 2025
snowmead and others added 24 commits November 19, 2025 08:21
- Refactored the storage request submission process in `move-bucket.test.ts` to utilize a batch processing helper
…Postgres DB (#563)

* fix: 🩹 Remove and update old comments

* feat: 🚧 Add `blockchain-service-db` crate and postgres schema

* feat: 🚧 Wire pending tx DB updating to `send_extrinsic`

* feat: 🚧 Add CLI param for db URL, initialise DB on BS startup and update status with watcher

* fix: 🐛 Use i64 for nonce

* feat: ✨ Add pending tx postgres to integration test suite

* fix: 🐛 Wire CLI pending db param to blockchain service initialisation

* test: ✅ Fix passing CLI pending db param in test suites

* feat: ✨ Clear pending txs from DB in finality

* docs: 📝 Document functions in `store.rs` for pending DB

* feat: ✨ Log when a pending tx has a state update but for a different tx hash

* fix: 🐛 Initialise Blockchain Service last processed blocks with genesis

* test: ✅ Fix tests using old indexer db container name

* test: ✅ Add back backend container initialisation

* fix: 🐛 Remove duplicate indexer nodes

* test: ✅ Fix name change mistakenly

* test: ✨ Add new pending DB testing utilities

* fix: 🗑️ Remove deprecated `createApiObject`

* test: ✅ Add persistent pending Tx DB integration tests

* feat: 🚧 Add `load_account_with_states` query to enable re-subscription at startup

* feat: ✨ Re-watch transactions pending after restarting MSP

* test: ✅ Add test for not re-watching extrinsic with nonce below on-chain nonce

* feat: ✨ Add `watched` boolean field to pending db

* feat: ✨ Persist gap filling remark transactions

* fix: ✅ Fix race condition where container wasn't fully paused

* feat: ✨ Add pendingDbUrl option to `addBspContainer` as well

* refactor: 🚚 Rename `insert_sent` to `upsert_sent`

* feat: 🔥 Remove unused `load_active` function from pending db interface

* refactor: ♻️ Use `Vec<String>` directly in `load_resubscribe_rows` params

* feat: 🩹 Remove usage of `sent` state in pending DB

* refactor: ♻️ Use `TransactionStatus` in db interface functions

* fix: 🐛 Track transaciton in `transaction_manager` even with no `call_scale`

* refactor: ♻️ Use constants for container names of DBs and backend

* test: ✅ Add check after sealing block of pending tx not updating

* feat: ✨ Add message in remark fake tx

* fix: ✅ Use new constant instead of hardcoded postgres container name

* fix: 🐛 Resubscribe to pending txs in initial sync handling istead of startup

* fix: 🐛 Set all txs pending to `watched=false` then only those we re-watch back to `true`

* Revert "fix: 🐛 Resubscribe to pending txs in initial sync handling istead of startup"

This reverts commit df6af95.

* fix: 🐛 Try to watch in_block pending transacitons too

* fix: 🐛 Not filter by on-chain nonce when re-subscribing to pending txs

* test: ✅ Improve test error logging

* feat: ✨ Add custom error to submit_and_watch_extrinsic

* fix: 🩹 Log and skip when error in re-subscribing is old nonce

* fix: ✅ Consider node race condition in test
- Consolidate capacity management with single increase per batch
- Add batch trimming to fit within capacity limits
- Implement batch rejection with single extrinsic for efficiency
- Extract helper methods for file metadata construction
- Improve logging for batch processing visibility
- Clean up imports and remove unused file_key_cleanup field
- retry block sealing for checking msp acceptance
Add `msp: Option<(ProviderIdFor<T>, bool)>` field to the NewStorageRequest
event to propagate MSP assignment information through events. This allows
MSP clients to determine if a storage request was created for them and
whether they have already accepted it, without needing to query storage
request metadata from the chain.
Prevents the MSP from reaccepting storage requests
The MSP could queue the same file key multiple times for acceptance,
causing MspAlreadyConfirmed errors when batches processed duplicate
entries. This occurred when multiple code paths (BatchProcessStorageRequests
and RemoteUploadRequest handlers) both called on_file_complete for the
same file.

Add a persistent HashSet (CFHashSetAPI) alongside the existing deque to
track pending file keys. Before queueing, check if the file key exists
in the set - skip if present, insert and queue if not. When popping from
the deque for batch processing, remove the file key from the set.
@snowmead snowmead changed the title feat(msp): msp batch respond storage requests per block feat(msp): msp preprocesses storage requests in batches per block Nov 26, 2025
@snowmead snowmead requested a review from ffarall November 26, 2025 18:56
snowmead and others added 5 commits December 1, 2025 08:56
This file contains local MCP server configuration and should not be tracked in the repository.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
snowmead and others added 18 commits December 1, 2025 10:37
…up after fisherman resume

  The `waitForIndexing` helper was using `system.number()` which returns the
  best block (e.g., #28), but the indexer only indexes finalized blocks (e.g.,
  #22). This caused a 30-second timeout waiting for "Indexing block #28" log
  that wouldn't appear until finalization caught up.
* refactor: 🚚 Move release docs to `resources`

* refactor: 🚚 Move config files to dedicated directory

* fix: 📝 Minor fix in release process doc

* refactor: 🚚 Move `backend_config` as well

* fix: 🩹 Remove extra lines in config file
* fix: 🚑 refactor upload code to avoid write lock deadlocks

* fix: 🔊 update logs to print file key and fingerprint as hexadecimal strings

* fix: 🐛 correctly error out when a file inconsistency gets detected

* revert: ⏪ revert changes to file storage
* fix: 🐛 avoid concurrent uploads for the same file key in the backend

* fix: ⏪ undo change in file diesel table
…d-storage-requests

# Conflicts:
#	client/src/tasks/msp_upload_file.rs
Resolved conflict in client/src/tasks/msp_upload_file.rs:
- Kept HEAD's handle_rejected_storage_request logic for proper on-chain rejection
- Updated return type to anyhow::Result<String> per EventHandler trait change from main
- Updated BatchProcessStorageRequests handler to return Result<String>
@snowmead snowmead requested a review from TDemeco December 5, 2025 13:17


6 participants